[Do Not Merge] Avoid subscriber re-fire when storage writes retry#792
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de62ae5eae
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const {keyValuePairsToStore, newData} = params; | ||
|
|
||
| return Storage.multiSet(keyValuePairsToStore) | ||
| .catch((error) => OnyxUtils.retryOperation(error, persistMultiSetWrite, params, retryAttempt)) |
There was a problem hiding this comment.
Retry orchestrator after eviction to restore cache state
When Storage.multiSet fails with a capacity error, retryOperation() evicts an LRU key via remove(), which immediately drops that key from cache and notifies subscribers before retrying. Retrying persistMultiSetWrite (and similarly persistCollectionWrite / persistMergedCollectionWrite) only re-attempts the storage write, so if the evicted key is part of the in-flight write, its cache value is never restored and subscribers keep the stale “removed” state even though storage eventually succeeds. This is a regression from the previous behavior where retries re-ran the orchestrator and re-applied cache updates.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in 11bc0a6.
Fix: each orchestrator (multiSetWithRetry, setCollectionWithRetry, partialSetCollection, mergeCollectionWithPatches) now constructs a restoreEvictedKey(key) closure that re-applies its cache write + subscriber notification for a single in-flight key. The closure is passed via the write helper's params; retryOperation invokes it after remove(keyForRemoval) and before re-entering the write helper. No-op when the evicted key isn't part of this write (the typical case).
mergeCollection nuance: the closure snapshots the cache's post-merge state (not just the delta) so previously-merged-in fields survive when the eviction's cache.drop forces us to re-populate cache from scratch — otherwise we'd lose fields like {id: 1} when restoring after a mergeCollection({key: {value: 'new'}}) retry.
Regression coverage: added 3 tests under storage eviction — one per write helper — that mock Storage.multiSet/Storage.multiMerge to fail with database or disk is full once on a setup where the LRU evictable key is the same key being written, and assert both cache + subscriber reflect the restored value (not undefined). All 3 fail without the lib changes (Received: undefined / missing fields) and pass with them.
459/459 tests pass; tsc and prettier clean.
Addresses Codex P1 on Expensify#792 review: Expensify#792 (comment) When `retryOperation()` evicts an LRU key via `remove()` on a storage capacity error, `remove()` immediately drops the key from cache and fires `keyChanged(undefined)`. With the orchestrator/write-helper split introduced in this PR, the retry only re-attempts the storage write -- it does not re-apply cache state. So if the evicted key is part of the in-flight write, cache + subscribers stay in the "removed" state even though the storage retry eventually succeeds, leaving the App's view of the world inconsistent with persisted data. Fix: each orchestrator now constructs a `restoreEvictedKey(key)` closure that re-applies its cache write + subscriber notification for a single in-flight key, and passes it via the write helper's params. `retryOperation` invokes the closure after `remove(keyForRemoval)`, before re-entering the write helper. The closure is a no-op when the evicted key isn't part of this write (the typical case where some unrelated evictable key gets selected). For mergeCollection specifically, the closure snapshots the cache's post-merge state for in-flight keys (rather than just the merge delta) so that previously-merged-in fields are preserved when the eviction's cache.drop forces us to re-populate cache from scratch. Adds 3 regression tests under `storage eviction`: - `multiSet` -- LRU evictable is the in-flight key (persistMultiSetWrite path) - `setCollection` -- same scenario via persistCollectionWrite - `mergeCollection` -- same scenario via persistMergedCollectionWrite, asserts the previously-existing `id` field survives the eviction All 3 tests fail with `Received: undefined` / missing fields if the lib changes are reverted; pass with them in place. All 459 unit tests pass (456 existing + 3 new).
|
i will review it soon |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e70af5e to
bd0b1e9
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bd0b1e9819
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Details
Follow-up to #787. After the cache-first/storage-second refactor of
mergeCollectionWithPatches, the storage-retry path was identified as re-notifying subscribers once per retry attempt —waitForCollectionCallbacksubscribers fire on everykeysChangedcall by contract, so a transient storage error could surface 2-6 callback invocations for a single logical write. Same shape applies tomultiSetWithRetry,setCollectionWithRetry, andpartialSetCollection— all four flow throughretryOperation, which re-enters the orchestrator on retry.setWithRetrywas already safe becausebroadcastUpdateshort-circuits onhasChanged === false.Fix shape
Two small pieces:
!retryAttemptguard around the subscriber notification in each of the four functions.cache.set/mergestays unguarded — it's idempotent, runs every attempt, and is what restores cache state afterretryOperation's eviction branch drops a key. The notification (keysChanged/keyChanged) is the one piece that's not safe to re-fire.skipNotifyflag onremove()+ passingtruefromretryOperation's eviction branch. Without this, eviction-during-retry would firekeyChanged(undefined)and strand subscribers in the "removed" state, since the orchestrator no longer re-fireskeysChangedon retry.cache.dropandStorage.removeItemstill run — only the spuriouskeyChanged(undefined)is skipped, and the imminent retry'scache.setcovers cache restoration.That's the entire diff: +157 / -9 across
lib/OnyxUtils.tsandtests/unit/onyxUtilsTest.ts.Related Issues
No separate GH issue — this PR addresses #787 review comment directly.
Linked E/App PR
Expensify/App#91626
Automated Tests
Added a new
retry side-effect idempotencydescribe block intests/unit/onyxUtilsTest.tswith one test per affected function. Each seeds awaitForCollectionCallback: truesubscriber, mocksStorage.multiSet(orStorage.multiMergeformergeCollection) to reject once then resolve, runs the write, and asserts the subscriber was called exactly 1 time across the retry chain:mergeCollection — waitForCollectionCallback subscriber fires once across retriesOnyx.multiSet — collection subscriber fires once across retriesOnyx.setCollection — collection subscriber fires once across retriesOnyxUtils.partialSetCollection — collection subscriber fires once across retriesAll 4 fail with
Received number of calls: 2if the!retryAttemptguard is reverted in any of the four orchestrators — confirmed locally viagit stashoflib/OnyxUtils.ts.All 6 existing storage eviction tests still pass without modification, validating that the
skipNotifychange toremove()preserves the eviction recovery path: the imminent retry'scache.setre-populates the cache for any evicted in-flight key, and subscribers never see a transientundefinedbecausekeyChanged(undefined)no longer fires from the eviction branch.Local results:
npx jest— 455 / 455 pass across 17 suites.npx tsc --noEmit— clean.Manual Tests
The four refactored paths (
mergeCollectionWithPatches,multiSetWithRetry,setCollectionWithRetry,partialSetCollection) are reached from the App via Pusher events, theOpenAppresponse, everyOnyx.updatebatch that contains aMERGE_COLLECTION/MULTI_SET/SET_COLLECTIONop, LHN refresh, Search filters, chat sends, mark-all-as-read, hold/unhold, workspace switching, etc. The companion App PR Expensify/App#91626 pinsreact-native-onyxto this branch's head for end-to-end exercise; the steps below are the same protocol used for #787.Setup
elirangoshen/fix/retry-side-effects-idempotencyoncallstack-internal/Expensify-App), which bumpspackage.json'sreact-native-onyxto this branch's head.npm installunder Node 20.20.0, thennpm run web.Functional smoke — no regression in the refactored paths
The refactor preserves the cache-first invariant from #787, so all of these should look indistinguishable from
mainto the user:main. (exercisesmergeCollectionWithPatchesvia Pusher /OpenApp)reportActions_), confirms via Pusher, persists after reload. (mergeCollectionWithPatches)mergeCollectionWithPatches)mergeCollectionWithPatches+setCollectionWithRetry)mergeCollectionWithPatches)mergeCollectionWithPatches)mergeCollectionWithPatches)Storage-failure simulation — core regression guard for this PR
This is the test that exercises the retry path the fix targets. With the fix in place, retries on storage failure should produce one subscriber notification per logical operation (not one per retry attempt), and brand-new keys should stay routed through
multiSeteven when an earliermultiMergeretry kicked in.OnyxDB).mergeCollection-driven action — switch workspaces, send a chat message, or apply a search filter.retryOperationkicks in. Console shows storage error logs and retry attempts (look forFailed to save to storage. Error: ... retryAttempt: N/5inLogger.logInfo), but no white screen, no stale UI, no data loss within the session, and no duplicatewaitForCollectionCallbacksubscriber invocations (verify via React DevTools "Highlight updates when components render" or via aconsole.countinstrumented in a temporaryuseOnyxcallback if you want to be precise).multiSet-driven action (least common — typically a deep-link prep) and asetCollection-driven action (Search filter result population) to cover the other refactored paths.Onyx.connect({waitForCollectionCallback: true})callbacks firing twice in DevTools traces, or (b) on the routing bug specifically, no observable user-facing difference on IDB but the issue is fixed under the hood for non-IDB backends.Cold-cache merge — preserves the pre-warm fast path from #787
mainsession — the cache pre-warm path (existing inmergeCollectionWithPatches) is untouched by this PR.Author Checklist
### Related Issuessection above### Linked E/App PRsection above, and verified this change against it (E/App CI passed and manual testing completed) (companion PR [No QA] Pin react-native-onyx to PR #792 head for retry-idempotency end-to-end testing App#91626 pins this branch's HEAD for end-to-end exercise)TestssectiontoggleReportand notonIconClick)myBool && <MyComponent />.STYLE.md) were followedAvatar, I verified the components usingAvatarare working as expected) (459/459 unit tests green)persistCollectionWriteis shared betweensetCollectionWithRetryandpartialSetCollectionsince their storage tails are byte-identical)/** comment above it */(N/A)thisproperly so there are no scoping issues (i.e. foronClick={this.submit}the methodthis.submitshould be bound tothisin the constructor) (N/A)thisare necessary to be bound (i.e. avoidthis.submit = this.submit.bind(this);ifthis.submitis never passed to a component event handler likeonClick) (N/A)Avataris modified, I verified thatAvataris working as expected in all cases) (pending companion App PR #91626)mainbranch was merged into this PR after a review, I tested again and verified the outcome was still expected according to theTeststeps. (N/A — no merge from main yet)Screenshots/Videos
Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
Screen.Recording.2026-05-26.at.14.41.29.mov
Screen.Recording.2026-05-26.at.14.42.06.mov
Screen.Recording.2026-05-26.at.14.42.27.mov
Screen.Recording.2026-05-26.at.14.43.07.mov
Screen.Recording.2026-05-26.at.14.44.17.mov
Screen.Recording.2026-05-26.at.14.44.48.mov